Komputilo

🆓🔧 FSlint = file deduplicator for GNU/🐧 Linux

What is it?

Ashish Kumar Sultania. Monitoring and Failure Recovery of Cloud-Managed Digital Signage. University of Tartu, Estonia, 2017.

The full backup often makes many redundant data copies, as datasets [... may not] have many changes between the backups. [...] Therefore, to eliminate redundancy, a space saving mechanism called Data deduplication can be used. It replaces multiple copies of data with references to a single compressed copy, thereby reducing the amount of needed space capacity. There are various tools available for data deduplication such as FSlint and fdupes. These tools scan directories to search duplicate files and provide actions such as to show, delete or replace the files with hardlinks. They work by comparing filesizes, MD5 signatures of partial file, MD5 signatures of the entire file, and then a cryptographic hash function of the entire file for verification. The comparisons of both these tools can be checked at [71] resulting that the FSlint is better in detecting duplicates.
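The quote mentions replacing duplicate files with hardlinks. Below is a minimal Python sketch of that single action. It assumes the files have already been identified as identical and that they live on the same filesystem, since hard links cannot cross filesystems. The helper `replace_with_hardlinks` and the `.dedup-tmp` suffix are hypothetical names for illustration. They are not options or code from FSlint or fdupes.

```python
import filecmp
import os


def replace_with_hardlinks(paths):
    """Hypothetical helper: keep paths[0] and turn every other path into a
    hard link to it. Assumes all paths are on the same filesystem."""
    keeper = paths[0]
    replaced = []
    for dup in paths[1:]:
        # Already the same inode (an existing hard link): nothing to do.
        if os.path.samefile(keeper, dup):
            continue
        # Byte-by-byte comparison before touching anything on disk.
        if not filecmp.cmp(keeper, dup, shallow=False):
            raise ValueError(f"{dup} differs from {keeper}")
        # Create the link under a temporary name in the same directory
        # (hypothetical suffix), then rename it over the duplicate so the
        # path never disappears in between.
        tmp = dup + ".dedup-tmp"
        os.link(keeper, tmp)
        os.replace(tmp, dup)
        replaced.append(dup)
    return replaced
```

After a successful run, every path in the group shares one inode. The data is stored once, and its link count equals the number of paths.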

How does it work?

FSlint FAQ.

In summary the algorithm is:

  1. exclude files with unique lengths
  2. handle files that are hardlinked to each other
  3. exclude files with unique md5(first_4k(file))
  4. exclude files with unique md5(whole file)
  5. exclude files with unique sha1(whole file) (in case of MD5 collisions)
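The five steps above can be sketched in Python as a chain of successive filters. Each filter splits candidate groups by a key and discards any group left with a single member. This is an illustration, not FSlint's own code. The names `find_duplicates`, `_digest` and `_refine` are hypothetical. Step 2 is interpreted here as collapsing paths that share a (device, inode) pair into one representative, which is an assumption about what "handle" means in the FAQ.

```python
import hashlib
import os
from collections import defaultdict


def _digest(path, algo, limit=None):
    """Hash the first `limit` bytes of a file, or the whole file if None."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        if limit is not None:
            h.update(f.read(limit))
        else:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
    return h.hexdigest()


def _refine(groups, key):
    """Split each group by `key`; drop sub-groups with only one member."""
    out = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key(path)].append(path)
        out.extend(b for b in buckets.values() if len(b) > 1)
    return out


def find_duplicates(root):
    # Collect regular files under root, skipping symlinks.
    files = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            if os.path.isfile(p) and not os.path.islink(p):
                files.append(p)

    # Step 1: a file with a unique length cannot have a duplicate.
    groups = _refine([files], os.path.getsize)

    # Step 2: hard links to the same inode are one file on disk,
    # so keep a single representative per (device, inode) pair.
    deduped = []
    for group in groups:
        seen = {}
        for p in group:
            st = os.stat(p)
            seen.setdefault((st.st_dev, st.st_ino), p)
        if len(seen) > 1:
            deduped.append(sorted(seen.values()))
    groups = deduped

    # Step 3: cheap MD5 of the first 4 KiB.
    groups = _refine(groups, lambda p: _digest(p, "md5", 4096))
    # Step 4: MD5 of the whole file.
    groups = _refine(groups, lambda p: _digest(p, "md5"))
    # Step 5: SHA-1 of the whole file, in case of MD5 collisions.
    groups = _refine(groups, lambda p: _digest(p, "sha1"))
    return groups
```

The cheap checks run first, so the expensive whole-file hashes are computed only for files that survived every earlier filter. For example, two files that share their size and their first 4 KiB but differ near the end pass step 3 and are separated only at step 4.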

blog